
    HRFuser: A Multi-resolution Sensor Fusion Architecture for 2D Object Detection

    Besides standard cameras, autonomous vehicles typically include multiple additional sensors, such as lidars and radars, which help acquire richer information for perceiving the content of the driving scene. While several recent works focus on fusing certain pairs of sensors - such as camera and lidar or camera and radar - by using architectural components specific to the examined setting, a generic and modular sensor fusion architecture is missing from the literature. In this work, we focus on 2D object detection, a fundamental high-level task which is defined on the 2D image domain, and propose HRFuser, a multi-resolution sensor fusion architecture that scales straightforwardly to an arbitrary number of input modalities. The design of HRFuser is based on state-of-the-art high-resolution networks for image-only dense prediction and incorporates a novel multi-window cross-attention block as the means to perform fusion of multiple modalities at multiple resolutions. Even though cameras alone provide very informative features for 2D detection, we demonstrate via extensive experiments on the nuScenes and Seeing Through Fog datasets that our model effectively leverages complementary features from additional modalities, substantially improving upon camera-only performance and consistently outperforming state-of-the-art fusion methods for 2D detection both in normal and adverse conditions. The source code will be made publicly available.
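
    The abstract names a multi-window cross-attention block but does not spell out its internals. Below is a minimal PyTorch-style sketch of what such a windowed cross-attention fusion step could look like, where camera features form the queries and one auxiliary modality (e.g., lidar projected to the image plane) forms the keys/values; the class name, window size, and residual fusion are illustrative assumptions, not the authors' implementation.

        import torch
        import torch.nn as nn

        class WindowCrossAttention(nn.Module):
            """Illustrative sketch: camera tokens (queries) attend to tokens of
            one auxiliary modality (keys/values) inside non-overlapping square
            windows; H and W are assumed divisible by the window size."""
            def __init__(self, dim, num_heads=4, window=8):
                super().__init__()
                self.window = window
                self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

            def _windows(self, x):
                # (B, C, H, W) -> (B * num_windows, window * window, C)
                B, C, H, W = x.shape
                w = self.window
                x = x.view(B, C, H // w, w, W // w, w)
                return x.permute(0, 2, 4, 3, 5, 1).reshape(-1, w * w, C)

            def forward(self, cam, aux):
                B, C, H, W = cam.shape
                w = self.window
                q, kv = self._windows(cam), self._windows(aux)
                out, _ = self.attn(q, kv, kv)
                out = out.view(B, H // w, W // w, w, w, C)
                out = out.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)
                return cam + out  # residual fusion into the camera branch

    One such block per auxiliary modality and per resolution, with the residuals summed into the camera branch, would scale to an arbitrary number of sensors in the way the abstract describes.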

    Fog Simulation on Real LiDAR Point Clouds for 3D Object Detection in Adverse Weather


    AutoSimulate: (Quickly) Learning Synthetic Data Generation

    Simulation is increasingly being used for generating large labelled datasets in many machine learning problems. Recent methods have focused on adjusting simulator parameters with the goal of maximising accuracy on a validation task, usually relying on REINFORCE-like gradient estimators. However, these approaches are very expensive, as they treat the entire data generation, model training, and validation pipeline as a black box and require multiple costly objective evaluations at each iteration. We propose an efficient alternative for optimal synthetic data generation, based on a novel differentiable approximation of the objective. This allows us to optimize the simulator, which may be non-differentiable, requiring only one objective evaluation at each iteration with little overhead. We demonstrate on a state-of-the-art photorealistic renderer that the proposed method finds the optimal data distribution faster (up to 50×), with significantly reduced training data generation (up to 30×) and better accuracy (+8.7%) on real-world test datasets than previous methods. (ECCV 2020)
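
    The core idea, backpropagating a validation loss through a cheap differentiable stand-in for the train-then-validate pipeline, can be shown on a toy problem. In the sketch below, the "simulator" is a differentiable Gaussian sampler with learnable mean psi, and full model training is replaced by a single unrolled SGD step, so each iteration needs exactly one validation-loss evaluation; this illustrates the bilevel structure only and is not the paper's actual approximation.

        import torch

        # Toy bilevel setup: psi parameterizes the simulator (the mean of a
        # Gaussian from which 1-D regression inputs are drawn via
        # reparameterization, keeping sampling differentiable); theta is a
        # linear model trained on the synthetic data.
        psi = torch.zeros(1, requires_grad=True)
        theta = torch.randn(1, requires_grad=True)
        x_val = torch.linspace(1.0, 2.0, 32).unsqueeze(1)  # "real" validation data
        y_val = 3.0 * x_val

        opt_psi = torch.optim.Adam([psi], lr=0.05)
        for step in range(200):
            x_syn = psi + torch.randn(64, 1)   # differentiable sampling
            y_syn = 3.0 * x_syn                # the simulator's label function
            inner_loss = ((x_syn * theta - y_syn) ** 2).mean()
            # One unrolled inner SGD step stands in for full model training.
            grad_theta, = torch.autograd.grad(inner_loss, theta, create_graph=True)
            theta_new = theta - 0.1 * grad_theta
            # Single objective evaluation per iteration, differentiable in psi.
            val_loss = ((x_val * theta_new - y_val) ** 2).mean()
            opt_psi.zero_grad()
            val_loss.backward()
            opt_psi.step()
            theta = theta_new.detach().requires_grad_(True)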

    Model Adaptation with Synthetic and Real Data for Semantic Dense Foggy Scene Understanding

    This work addresses the problem of semantic scene understanding under dense fog. Although considerable progress has been made in semantic scene understanding, it is mainly related to clear-weather scenes. Extending recognition methods to adverse weather conditions such as fog is crucial for outdoor applications. In this paper, we propose a novel method, named Curriculum Model Adaptation (CMAda), which gradually adapts a semantic segmentation model from light synthetic fog to dense real fog in multiple steps, using both synthetic and real foggy data. In addition, we present three other main stand-alone contributions: 1) a novel method to add synthetic fog to real, clear-weather scenes using semantic input; 2) a new fog density estimator; 3) the Foggy Zurich dataset comprising 3808 real foggy images, with pixel-level semantic annotations for 16 images with dense fog. Our experiments show that 1) our fog simulation slightly outperforms a state-of-the-art competing simulation with respect to the task of semantic foggy scene understanding (SFSU); 2) CMAda significantly improves the performance of state-of-the-art models for SFSU by leveraging unlabeled real foggy data. The datasets and code are publicly available. (ECCV 2018)
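
    As a hedged sketch of the curriculum loop, the function below injects the fog simulator, fog density estimator, and fine-tuning routine as callables (placeholders for the components the paper defines; the density values are illustrative only): each step moves to a denser fog level, mixing synthetic fog rendered on labeled clear-weather data with real foggy images that the previous-step model pseudo-labels.

        from typing import Callable, Iterable

        def cmada(model,
                  clear_data: Iterable,       # (clear_image, semantics, label) triples
                  real_foggy_pool: Iterable,  # unlabeled real foggy images
                  add_fog: Callable,          # semantic-aware fog simulator (placeholder)
                  fog_density: Callable,      # fog density estimator (placeholder)
                  finetune: Callable,         # one adaptation step (placeholder)
                  densities=(0.005, 0.01, 0.02)):  # light -> dense fog levels
            for beta in densities:
                # Synthetic fog of the current density on labeled clear data.
                synth = [(add_fog(img, sem, beta), lbl)
                         for img, sem, lbl in clear_data]
                # Real foggy images no denser than the current level, labeled
                # with predictions of the model adapted in the previous step.
                real = [(img, model(img)) for img in real_foggy_pool
                        if fog_density(img) <= beta]
                model = finetune(model, synth + real)
            return model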

    One-Shot Unsupervised Cross-Domain Detection

    Despite impressive progress in object detection over the last years, it is still an open challenge to reliably detect objects across visual domains. Although the topic has attracted attention recently, current approaches all rely on the ability to access a sizable amount of target data for use at training time. This is a heavy assumption, as often it is not possible to anticipate the domain where a detector will be used, nor to access it in advance for data acquisition. Consider for instance the task of monitoring image feeds from social media: as every image is created and uploaded by a different user, it belongs to a different target domain that is impossible to foresee during training. This paper addresses this setting, presenting an object detection algorithm able to perform unsupervised adaptation across domains by using only one target sample, seen at test time. We achieve this by introducing a multi-task architecture that adapts to any incoming sample in one shot by iteratively solving a self-supervised task on it. We further enhance this auxiliary adaptation with cross-task pseudo-labeling. A thorough benchmark analysis against the most recent cross-domain detection methods and a detailed ablation study show the advantage of our method, which sets the state of the art in the defined one-shot scenario.
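
    A sketch of the test-time adaptation step, assuming rotation prediction as the self-supervised task (a common choice for this recipe; the exact task, schedule, and the cross-task pseudo-labeling refinement are not shown): the shared backbone is fine-tuned on the single target image before the detection head runs.

        import torch
        import torch.nn.functional as F

        def one_shot_adapt(backbone, rot_head, image, steps=10, lr=1e-4):
            """Fine-tune the shared backbone on rotation prediction built from
            one target image; `rot_head` is a hypothetical auxiliary classifier
            over the four 90-degree rotations. `image` is a (C, H, W) tensor."""
            opt = torch.optim.SGD(backbone.parameters(), lr=lr)
            for _ in range(steps):
                k = torch.randint(4, (1,)).item()
                rotated = torch.rot90(image, k, dims=(-2, -1))
                logits = rot_head(backbone(rotated.unsqueeze(0)))  # (1, 4)
                loss = F.cross_entropy(logits, torch.tensor([k]))
                opt.zero_grad(); loss.backward(); opt.step()
            return backbone  # the adapted backbone then feeds the detector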

    Region Proposal Oriented Approach for Domain Adaptive Object Detection

    Faster R-CNN has become a standard model in deep-learning-based object detection. However, in many cases, few annotations are available for images in the application domain, referred to as the target domain, whereas full annotations are available for closely related public or synthetic datasets, referred to as source domains. Thus, domain adaptation is needed to train a model that performs well in the target domain with few or no annotations there. In this work, we address this domain adaptation problem in the context of object detection in the case where no annotations are available in the target domain. Most existing approaches consider adaptation at both the global and instance level but do not adapt the region proposal sub-network, leading to a residual domain shift. After a detailed analysis of the classical Faster R-CNN detector, we show that adapting the region proposal sub-network is crucial and propose an original way to do it. We run experiments in two different application contexts, namely autonomous driving and ski-lift video surveillance, and show that our adaptation scheme clearly outperforms the previous solutions.
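
    The abstract keeps the proposed adaptation of the region proposal sub-network abstract, so the sketch below shows the standard adversarial route such an adaptation could take: a small domain discriminator on the feature map feeding the RPN, trained through a gradient reversal layer (Ganin & Lempitsky), which pushes those features toward domain invariance. This is the generic trick, not necessarily the paper's original scheme.

        import torch
        import torch.nn as nn

        class GradReverse(torch.autograd.Function):
            """Identity in the forward pass, negated (scaled) gradient in the
            backward pass: the usual adversarial alignment trick."""
            @staticmethod
            def forward(ctx, x, lam):
                ctx.lam = lam
                return x.view_as(x)

            @staticmethod
            def backward(ctx, grad_out):
                return -ctx.lam * grad_out, None

        class RPNDomainDiscriminator(nn.Module):
            """Hypothetical sketch: per-location domain logits predicted from
            the feature map that feeds the region proposal network."""
            def __init__(self, in_ch=512):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv2d(in_ch, 256, kernel_size=1), nn.ReLU(),
                    nn.Conv2d(256, 1, kernel_size=1))

            def forward(self, rpn_feats, lam=1.0):
                return self.net(GradReverse.apply(rpn_feats, lam))

    During training, a binary domain loss (e.g., BCEWithLogitsLoss over source/target labels at every spatial location) on these logits would be added to the usual detection losses, which are computed on source images only.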